Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). The existing methods are challenging to sufficiently segment the airway, especially the high-generation airway, with the constraint of the limited label and cannot meet the clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two-stage model shares the same subnetwork with different airway masks as input. Contextual transformer block is performed both in the encoder and decoder path of the subnetwork to finish high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork, and the intrapulmonary airway mask and corresponding CT scans to the subnetwork in the second stage. Then the predictions of the two-stage method are merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analysis demonstrate that our proposed method extracted much more branches and lengths of the tree while accomplishing state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.
translated by 谷歌翻译
The heterogeneity gap problem is the main challenge in cross-modal retrieval. Because cross-modal data (e.g. audiovisual) have different distributions and representations that cannot be directly compared. To bridge the gap between audiovisual modalities, we learn a common subspace for them by utilizing the intrinsic correlation in the natural synchronization of audio-visual data with the aid of annotated labels. TNN-CCCA is the best audio-visual cross-modal retrieval (AV-CMR) model so far, but the model training is sensitive to hard negative samples when learning common subspace by applying triplet loss to predict the relative distance between inputs. In this paper, to reduce the interference of hard negative samples in representation learning, we propose a new AV-CMR model to optimize semantic features by directly predicting labels and then measuring the intrinsic correlation between audio-visual data using complete cross-triple loss. In particular, our model projects audio-visual features into label space by minimizing the distance between predicted label features after feature projection and ground label representations. Moreover, we adopt complete cross-triplet loss to optimize the predicted label features by leveraging the relationship between all possible similarity and dissimilarity semantic information across modalities. The extensive experimental results on two audio-visual double-checked datasets have shown an improvement of approximately 2.1% in terms of average MAP over the current state-of-the-art method TNN-CCCA for the AV-CMR task, which indicates the effectiveness of our proposed model.
translated by 谷歌翻译
室外(OOD)检测是面向任务的对话框系统中的关键组件,旨在确定查询是否不在预定义的支持的意图集之外。事实证明,先前基于软磁性的检测算法对OOD样品被过度自信。在本文中,我们分析了过度自信的OOD来自由于训练和测试分布之间的不匹配而导致的分布不确定性,这使得该模型无法自信地做出预测,因此可能导致异常软磁得分。我们提出了一个贝叶斯OOD检测框架,以使用Monte-Carlo辍学来校准分布不确定性。我们的方法是灵活的,并且可以轻松地插入现有的基于软磁性的基线和增益33.33 \%OOD F1改进,而与MSP相比仅增加了0.41 \%的推理时间。进一步的分析表明,贝叶斯学习对OOD检测的有效性。
translated by 谷歌翻译
传统意图分类模型基于预定义的意图集,仅识别有限的内域(IND)意图类别。但是用户可以在实用的对话系统中输入室外(OOD)查询。这样的OOD查询可以提供未来改进的方向。在本文中,我们定义了一项新任务,广义意图发现(GID),旨在将IND意图分类器扩展到包括IND和OOD意图在内的开放世界意图集。我们希望在发现和识别新的未标记的OOD类型的同时,同时对一组标记的IND意图类进行分类。我们为不同的应用程序方案构建了三个公共数据集,并提出了两种框架,即基于管道的框架和端到端,以实现未来的工作。此外,我们进行详尽的实验和定性分析,以理解关键挑战,并为未来的GID研究提供新的指导。
translated by 谷歌翻译
面价/唤醒,表达和动作单元是面部情感分析中的相关任务。但是,由于各种收集的条件,这些任务仅在野外的性能有限。野外情感行为分析的第四次竞争(ABAW)提供了价值/唤醒,表达和动作单元标签的图像。在本文中,我们介绍了多任务学习框架,以增强野外三个相关任务的性能。功能共享和标签融合用于利用它们的关系。我们对提供的培训和验证数据进行实验。
translated by 谷歌翻译
从合成图像中学习由于标记真实图像的困难而在面部表达识别任务中起着重要作用,并且由于合成图像和真实图像之间存在差距而具有挑战性。第四次情感行为分析在野外竞争增加了挑战,并提供了Aff-Wild2数据集生成的合成图像。在本文中,我们提出了一种手工辅助表达识别方法,以减少合成数据和真实数据之间的差距。我们的方法由两个部分组成:表达识别模块和手部预测模块。表达识别模块提取表达信息,并预测模块预测图像是否包含手。决策模式用于结合两个模块的结果,并使用后延伸来改善结果。F1分数用于验证我们方法的有效性。
translated by 谷歌翻译
本文介绍了一种基于单模态语义分割的新型坑洞检测方法。它首先使用卷积神经网络从输入图像中提取视觉特征。然后,通道注意力模块重新引起通道功能以增强不同特征映射的一致性。随后,我们采用了一个不足的空间金字塔汇集模块(包括串联循环升级的不足卷积)来整合空间上下文信息。这有助于更好地区分坑洼和未损害的道路区域。最后,相邻层中的特征映射使用我们提出的多尺度特征融合模块融合。这进一步降低了不同特征通道层之间的语义间隙。在Pothole-600数据集上进行了广泛的实验,以证明我们提出的方法的有效性。定量比较表明,我们的方法在RGB图像和变换的差异图像上实现了最先进的(SOTA)性能,优于三个SOTA单模语义分段网络。
translated by 谷歌翻译
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.
translated by 谷歌翻译
Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.
translated by 谷歌翻译
Neural Architecture Search (NAS) is an automatic technique that can search for well-performed architectures for a specific task. Although NAS surpasses human-designed architecture in many fields, the high computational cost of architecture evaluation it requires hinders its development. A feasible solution is to directly evaluate some metrics in the initial stage of the architecture without any training. NAS without training (WOT) score is such a metric, which estimates the final trained accuracy of the architecture through the ability to distinguish different inputs in the activation layer. However, WOT score is not an atomic metric, meaning that it does not represent a fundamental indicator of the architecture. The contributions of this paper are in three folds. First, we decouple WOT into two atomic metrics which represent the distinguishing ability of the network and the number of activation units, and explore better combination rules named (Distinguishing Activation Score) DAS. We prove the correctness of decoupling theoretically and confirmed the effectiveness of the rules experimentally. Second, in order to improve the prediction accuracy of DAS to meet practical search requirements, we propose a fast training strategy. When DAS is used in combination with the fast training strategy, it yields more improvements. Third, we propose a dataset called Darts-training-bench (DTB), which fills the gap that no training states of architecture in existing datasets. Our proposed method has 1.04$\times$ - 1.56$\times$ improvements on NAS-Bench-101, Network Design Spaces, and the proposed DTB.
translated by 谷歌翻译